
POC For docker compose #46570


Closed
wants to merge 4 commits into from


WillAyd
Member

@WillAyd WillAyd commented Mar 30, 2022

POC using docker compose, which the Arrow project also uses.

The idea here is that we can simply run docker compose build db-testing to build an image with postgres (mysql can be added later) and our minimal development requirements, then run docker compose run --rm db-testing to run the relevant tests. This works both for a developer locally and on GH Actions.
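A minimal sketch of a compose file that could sit behind those two commands. Only the service name db-testing comes from this PR; the Dockerfile name, mount path and test target are assumptions for illustration, and it assumes the image's entrypoint starts the postgres server before the command runs.

```yaml
# Hypothetical docker-compose.yml; only the service name "db-testing"
# comes from this PR, every other value is an illustrative assumption.
services:
  db-testing:
    build:
      context: .              # repository root as the build context (assumed)
      dockerfile: Dockerfile  # image with postgres + minimal dev requirements (assumed)
    volumes:
      - .:/home/pandas        # mount the local checkout into the container (assumed path)
    working_dir: /home/pandas
    # Assumes the image's entrypoint starts postgres before this command runs.
    command: pytest pandas/tests/io/test_sql.py
```

With this in place, docker compose build db-testing builds the image and docker compose run --rm db-testing runs the tests, removing the container afterwards.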

This still needs a bit more work, as it currently muddles user permissions on the host when building pandas.
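One common workaround for root-owned files on a bind mount, sketched here as an assumption rather than something this PR implemented, is to run the container as the host user. HOST_UID and HOST_GID are hypothetical variable names (the shell's own UID variable is read-only in bash, so it is avoided), and the image must let that user write to the working directory.

```yaml
# Sketch of a common workaround, not part of this PR: run as the host user so
# files written into the bind-mounted checkout (e.g. compiled extension
# modules) are not owned by root. HOST_UID/HOST_GID are hypothetical names.
services:
  db-testing:
    build: .
    user: "${HOST_UID:-1000}:${HOST_GID:-1000}"  # falls back to 1000:1000 if unset
    volumes:
      - .:/home/pandas  # assumed mount point
    working_dir: /home/pandas
```

Usage would then be HOST_UID=$(id -u) HOST_GID=$(id -g) docker compose run --rm db-testing.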

cc @jonashaag, who seems to have been doing some awesome work on CI lately

@WillAyd WillAyd added the CI Continuous Integration label Mar 30, 2022
FROM python

RUN apt-get update
RUN apt-get install -y postgresql postgresql-contrib
Member


What's the reason to install & configure postgres in this image instead of using the official postgres image?

services:
    postgres:
        image: postgres
        ....

Member Author


Not sure if the Postgres image includes Python, but even if it does, longer term we may want to parametrize different versions of Python. We can parametrize the base Python image to build off of, but if we used Postgres as the base image we'd have to set up more layers to make that work.
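Parametrizing the Python base image could look roughly like this: a build argument passed from compose into the Dockerfile. The argument name PYTHON_VERSION and the default are assumptions; the Dockerfile would need to declare ARG PYTHON_VERSION before FROM python:${PYTHON_VERSION}.

```yaml
# Hypothetical sketch: pass the Python version to the Dockerfile as a build
# argument. Assumes the Dockerfile begins with
#   ARG PYTHON_VERSION=3.10
#   FROM python:${PYTHON_VERSION}
services:
  db-testing:
    build:
      context: .
      args:
        PYTHON_VERSION: "${PYTHON_VERSION:-3.10}"  # defaults to 3.10 when unset
```

For example, PYTHON_VERSION=3.8 docker compose build db-testing would build against the python:3.8 base image.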

Member


Oh, I was imagining something closer to:

services:
  pandas-testing:
    build:
      context: .
      dockerfile: pandas-container.dockerfile
    command: pytest ...
  postgres-db:
    image: postgres
    ports:
      - "5432:5432"
  mysql-db:
    image: mysql
    ports:
      ...
  s3:
    image: motoserver/moto
    ports:
      ...

so that postgres is an independently running service that the testing service can talk to.
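A slightly fuller sketch of that layout, assuming the official postgres and mysql images. The credentials, database names and test target are placeholders, not pandas' actual test settings. On the default compose network the test container reaches the databases by service name (postgres-db:5432, mysql-db:3306), so a ports mapping is only needed for access from the host.

```yaml
# Sketch assuming the official postgres and mysql images; credentials and
# database names are placeholders, not pandas' real test configuration.
services:
  pandas-testing:
    build: .
    command: pytest pandas/tests/io/test_sql.py  # assumed test target
    depends_on:
      postgres-db:
        condition: service_healthy  # wait until the healthcheck passes
      mysql-db:
        condition: service_healthy
  postgres-db:
    image: postgres
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: pandas
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "postgres"]
      interval: 5s
      retries: 10
  mysql-db:
    image: mysql
    environment:
      MYSQL_ROOT_PASSWORD: mysql
      MYSQL_DATABASE: pandas
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 5s
      retries: 10
```

Using healthchecks with condition: service_healthy avoids tests starting before the databases accept connections, which a plain depends_on list does not guarantee.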

Member Author


Nice - didn't even know this was possible. Will test it out

@mroeschke
Member

I am in favor of having what users test locally use the same setup as what the CI runs, which containers help enforce.

We should keep in mind that Docker is a freemium tool (Docker Hub limits the number of pulls over a period of time, although the limit may be hard to hit, and there are some free container registries out there).

I'll be interested to see how we can leverage docker-compose with the CI providers (my initial gut feeling is that it might just be a lateral move) while maintaining test coverage across different platforms and Python versions. We're almost at the point where we can say:

for platform in ["windows", "mac", "ubuntu"]:
    for python_version in ["3.8", "3.9", "3.10"]:
        # test with platform + python_version + all required & optional dependencies
        ...

I would suggest that the docker-compose POC mirror the jobs that GHA currently runs, for an easier comparison with our current CI setup. In terms of testing, that means all tests can be run for a particular platform + Python version.
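A rough sketch of how a GHA job could drive compose across Python versions with a matrix. The workflow, job and service names are assumptions, and it assumes the compose file reads ${PYTHON_VERSION} as a build argument. Because these are Linux containers, the Windows/macOS part of the matrix would still need native jobs, which is part of why this may be a lateral move.

```yaml
# Hypothetical workflow sketch; names are assumptions. Assumes the compose
# file passes ${PYTHON_VERSION} to the Dockerfile as a build argument.
name: docker-compose-tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest  # Linux containers only; Windows/macOS stay native
    strategy:
      matrix:
        # Quoted: unquoted 3.10 would be parsed by YAML as the number 3.1
        python-version: ["3.8", "3.9", "3.10"]
    env:
      PYTHON_VERSION: ${{ matrix.python-version }}
    steps:
      - uses: actions/checkout@v3
      - name: Build test image
        run: docker compose build pandas-testing
      - name: Run tests
        run: docker compose run --rm pandas-testing
```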

@WillAyd
Member Author

WillAyd commented Mar 30, 2022

Awesome feedback. The one area where Docker is definitely an upgrade over our current CI is reproducibility for end users. GHA workflows are good but, AFAIK, not something you can reproduce locally. I think the DB tests in particular are an area where we struggle with that.

In the future there might also be a use case where we create base images with pinned minimum versions and always reference them, rather than building from scratch in GHA every time. This could help reduce build/test times a bit.
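A sketch of how compose could support such prebuilt images: when a service sets both image and build, Compose tags the built image with that name, so it can be built and pushed once (docker compose build min-versions, then docker compose push min-versions) and later fetched with docker compose pull min-versions. The service name, registry path and Dockerfile name below are placeholders, not existing artifacts.

```yaml
# Hypothetical sketch: with both "image" and "build" set, Compose tags the
# built image with the given name so it can be pushed and pulled later.
# The registry path and Dockerfile name are placeholders.
services:
  min-versions:
    image: ghcr.io/example-org/pandas-min-versions:latest
    build:
      context: .
      dockerfile: min-versions.dockerfile  # pins minimum dependency versions (assumed)
```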

We already have this with the existing Dockerfile, but if we roll that into compose we get a consistent way to create a development environment (especially for new users), which our current CI can't help with.

@jbrockmendel
Member

> I am in favor of having what users test locally use the same setup as what the CI runs, which containers help enforce.

+1

@WillAyd
Member Author

WillAyd commented Apr 1, 2022

So this now works with a simple docker compose up, which starts both DB services and runs all the tests.
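For CI, a plain docker compose up keeps the database containers running after the tests finish and does not report the test result as its exit code. A sketch of a workflow step that addresses this, assuming a test service named pandas-testing: --exit-code-from stops all containers once that service exits and returns its exit code, so a failing pytest run fails the job.

```yaml
# Sketch of CI steps (workflow layout and service name assumed).
# --exit-code-from implies --abort-on-container-exit.
steps:
  - uses: actions/checkout@v3
  - name: Start databases and run tests
    run: docker compose up --build --exit-code-from pandas-testing
```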

@WillAyd
Member Author

WillAyd commented Apr 4, 2022

Punting for now. Can reopen if it becomes more clearly useful.

@WillAyd WillAyd closed this Apr 4, 2022
@WillAyd WillAyd deleted the docker-compose branch April 12, 2023 20:16
Labels
CI Continuous Integration
Development

Successfully merging this pull request may close these issues.

ENH: Use docker compose
3 participants